Smart Paper Notes

Deforming Radiance Fields with Cages

Tianhan Xu , Tatsuya Harada

Category: Computer Vision

2022-07-25

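Recent advances in radiance fields enable photorealistic rendering of static or dynamic 3D scenes, but they still do not support the explicit deformation used for scene manipulation or animation. In this paper, we propose a method that enables a new type of deformation of the radiance field: free-form radiance field deformation. We use a triangular mesh that encloses the foreground object, called a cage, as an interface; by manipulating the cage vertices, our approach enables free-form deformation of the radiance field. The core of our approach is cage-based deformation, which is commonly used in mesh deformation. We propose a novel formulation that extends it to the radiance field by mapping the position and view direction of the sampling points from the deformed space to the canonical space, thus enabling rendering of the deformed scene. Deformation results on synthetic and real-world datasets demonstrate the effectiveness of our approach.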
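The mapping from deformed space back to canonical space can be illustrated with a deliberately simplified sketch. It assumes a four-vertex (tetrahedral) cage, for which the cage coordinates reduce to ordinary barycentric coordinates; the abstract describes a general triangular-mesh cage, so this is not the authors' formulation. The function names and the finite-difference treatment of the view direction are assumptions made for illustration.

```python
import numpy as np

def barycentric_weights(cage, p):
    """Weights w with sum(w) == 1 and w @ cage == p for a 4-vertex cage."""
    # Solve the 4x4 system [cage^T; 1 1 1 1] w = [p; 1].
    A = np.vstack([cage.T, np.ones(4)])
    b = np.append(p, 1.0)
    return np.linalg.solve(A, b)

def to_canonical(deformed_cage, canonical_cage, p, d, eps=1e-4):
    """Map a sample point p and view direction d from deformed to canonical space."""
    # Express p relative to the deformed cage, then reuse the same
    # weights with the canonical (undeformed) cage vertices.
    w = barycentric_weights(deformed_cage, p)
    p_canonical = w @ canonical_cage
    # Assumption: approximate the mapped direction by mapping a second
    # point a small step along the ray and normalising the difference.
    w_step = barycentric_weights(deformed_cage, p + eps * d)
    d_canonical = w_step @ canonical_cage - p_canonical
    return p_canonical, d_canonical / np.linalg.norm(d_canonical)

# Hypothetical example: the cage is stretched by a factor of 2 along x.
canonical_cage = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
deformed_cage = canonical_cage * np.array([2.0, 1.0, 1.0])

p_c, d_c = to_canonical(
    deformed_cage,
    canonical_cage,
    p=np.array([0.5, 0.25, 0.25]),
    d=np.array([1.0, 0.0, 0.0]),
)
```

A renderer would then query the canonical radiance field at `p_c` with direction `d_c` instead of at the deformed-space sample.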

Translated by Google Translate

A multi-scale sampling method for accurate and robust deep neural network to predict combustion chemical kinetics

Tianhan Zhang , Yuxiao Yi , Yifan Xu , Zhi X. Chen , Yaoyu Zhang , Weinan E , Zhi-Qin John Xu

Category: Machine Learning

2022-01-09

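Machine learning has long been regarded as a black box for predicting combustion chemical kinetics, owing to the extremely large number of parameters and the lack of evaluation standards and reproducibility. The current work aims to answer two basic questions about the deep neural network (DNN) method: what data the DNN needs, and how general the DNN method can be. Sampling and preprocessing determine the DNN training dataset and thereby affect the DNN's prediction ability. The current work proposes using the Box-Cox transformation (BCT) to preprocess the combustion data. In addition, it compares different sampling methods with and without preprocessing, including the Monte Carlo method, manifold sampling, a generative neural network method (cycle-GAN), and the newly proposed multi-scale sampling. Our results show that a DNN trained on manifold data can capture the chemical kinetics in limited configurations but is not robust to perturbations, which are inevitable for a DNN coupled with a flow field. Monte Carlo and cycle-GAN sampling can cover a wider phase space but fail to capture small-scale intermediate species, producing poor predictions. A three-layer DNN, based on the multi-scale method and trained without data from any specific flame simulation, can predict chemical kinetics in various scenarios and remains stable during temporal evolution. This single DNN is readily implemented in several CFD codes and is validated in various combustors, including (1) zero-dimensional autoignition, (2) one-dimensional freely propagating flames, (3) a two-dimensional jet flame with a triple-flame structure, and (4) three-dimensional turbulent lifted flames. The results demonstrate the satisfactory accuracy and generalization ability of the pre-trained DNN. Fortran and Python versions of the DNN and example code are attached in the supplementary material for reproducibility.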
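As a reminder of what the Box-Cox transformation does to data spanning many orders of magnitude, here is a minimal sketch of the standard one-parameter form and its inverse. The value of `lam` and the sample values are hypothetical; the abstract does not state which parameter the authors use.

```python
import math

def box_cox(x, lam):
    """Standard one-parameter Box-Cox transform of a positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def inverse_box_cox(y, lam):
    """Inverse of box_cox for the same lam."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)

# Hypothetical values spanning many orders of magnitude, as species
# concentrations in combustion data typically do.
values = [1e-12, 1e-6, 1e-2, 0.5]
lam = 0.1  # assumption for illustration only
transformed = [box_cox(v, lam) for v in values]
recovered = [inverse_box_cox(t, lam) for t in transformed]
```

Unlike a plain logarithm, the transform with a small positive `lam` stays bounded below as `x` approaches zero, which keeps trace species on a comparable scale to major species.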


A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics

Zhiwei Wang , Yaoyu Zhang , Yiguang Ju , Weinan E , Zhi-Qin John Xu , Tianhan Zhang

Category: Machine Learning

2022-01-06

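A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics is proposed and validated using high-temperature auto-ignition, perfectly stirred reactors (PSR), and one-dimensional freely propagating flames of n-heptane/air mixtures. Mechanism reduction is modeled as an optimization problem on Boolean space, where a Boolean vector, with each entry corresponding to a species, represents a reduced mechanism. The optimization goal is to minimize the size of the reduced mechanism subject to an error tolerance on a set of pre-selected benchmark quantities. The key idea of DeePMR is to use a deep neural network (DNN) to formulate the objective function of the optimization problem. To explore the high-dimensional Boolean space efficiently, an iterative procedure of DNN-assisted data sampling and DNN training is implemented. The results show that DNN assistance improves sampling efficiency significantly, selecting only $10^{5}$ samples out of $10^{34}$ possible samples to achieve sufficient accuracy. The results also demonstrate the DNN's ability to identify key species and to predict the performance of reduced mechanisms reasonably well. The well-trained DNN guarantees the optimal reduced mechanism by solving an inverse optimization problem. Comparing ignition delay times, laminar flame speeds, and PSR temperatures, the resulting skeletal mechanism has fewer species (45 species) but the same level of accuracy as the skeletal mechanism (56 species) obtained by the path flux analysis (PFA) method. In addition, if only atmospheric, near-stoichiometric conditions (equivalence ratio between 0.6 and 1.2) are considered, the skeletal mechanism can be further reduced to 28 species. DeePMR provides an innovative way to perform model reduction and demonstrates the great potential of data-driven methods in the combustion area.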
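The Boolean-space formulation can be made concrete with a toy sketch. It is not DeePMR: the five species names, their error contributions, and `benchmark_error` are all hypothetical stand-ins, and the search is brute-force enumeration, which is only feasible because the toy space has 2^5 = 32 candidates. The abstract's point is that a real mechanism has far too many candidates to enumerate, hence the DNN surrogate and iterative sampling.

```python
from itertools import product

# Hypothetical species and the error (in arbitrary integer units) that
# removing each one adds to the benchmark quantities.
SPECIES = ["A", "B", "C", "D", "E"]
REMOVAL_ERROR = {"A": 50, "B": 2, "C": 30, "D": 1, "E": 4}

def benchmark_error(mask):
    """Stand-in for the expensive evaluation of a reduced mechanism."""
    return sum(REMOVAL_ERROR[s] for s, keep in zip(SPECIES, mask) if not keep)

def smallest_mechanism(tolerance):
    """Return the Boolean vector with the fewest kept species within tolerance."""
    candidates = product((True, False), repeat=len(SPECIES))
    feasible = [m for m in candidates if benchmark_error(m) <= tolerance]
    # sum(mask) counts the True entries, i.e. the mechanism size.
    return min(feasible, key=sum)

best_mask = smallest_mechanism(tolerance=4)
kept_species = [s for s, keep in zip(SPECIES, best_mask) if keep]
```

Each Boolean vector is one candidate reduced mechanism, the tolerance plays the role of the constraint, and the number of `True` entries is the objective being minimized.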


Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis

Tianhan Xu , Yasuhiro Fujita , Eiichi Matsumoto

Category: Computer Vision

2022-01-05

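We propose a new method for reconstructing controllable implicit 3D human models from sparse multi-view RGB videos. Our method defines the neural scene representation on mesh surface points and signed distances from the surface of a human body mesh. We identify an indistinguishability issue that arises when a point in 3D space is mapped to its nearest surface point on a mesh for learning a surface-aligned neural scene representation. To address this issue, we propose projecting a point onto the mesh surface using barycentric interpolation with modified vertex normals. Experiments on the ZJU-MoCap and Human3.6M datasets show that our approach achieves higher quality in novel-view and novel-pose synthesis than existing methods. We also show that our method readily supports control of body shape and clothing.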
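To make the "surface point plus signed distance" representation concrete, the sketch below reconstructs a 3D point from barycentric coordinates on one triangle and a signed distance along the barycentrically interpolated vertex normal. It shows only this forward direction, under assumed inputs; the projection step described in the abstract (finding the surface point for a given 3D point, with modified vertex normals) is not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def point_from_surface(tri_vertices, tri_normals, bary, h):
    """Point at signed distance h from a triangle, along the interpolated normal.

    tri_vertices, tri_normals: (3, 3) arrays, one row per triangle vertex.
    bary: barycentric coordinates (3 values summing to 1).
    """
    bary = np.asarray(bary, dtype=float)
    surface_point = bary @ tri_vertices
    # Interpolate the per-vertex normals with the same weights and
    # re-normalise, since a blend of unit vectors is not unit length.
    normal = bary @ tri_normals
    normal = normal / np.linalg.norm(normal)
    return surface_point + h * normal

# Hypothetical triangle in the z = 0 plane with normals tilted apart.
tri_vertices = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float)
tri_normals = np.array([[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]])

at_vertex = point_from_surface(tri_vertices, tri_normals, [1, 0, 0], h=0.1)
at_centroid = point_from_surface(tri_vertices, tri_normals, [1 / 3, 1 / 3, 1 / 3], h=0.1)
```

Because the normal varies smoothly across the triangle, nearby surface points map to distinct offset directions, which is the property a surface-aligned representation relies on.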


Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation

Yue Han , Jiangning Zhang , Zhucun Xue , Chao Xu , Xintian Shen , Yabiao Wang , Chengjie Wang , Yong Liu , Xiangtai Li

Category: Computer Vision

2023-01-03

Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose linking object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared with existing approaches across different shots, e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and model will be available.


AI in HCI Design and User Experience

Wei Xu

Category: Artificial Intelligence

2023-01-03

In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.


More is Better: A Database for Spontaneous Micro-Expression with High Frame Rates

Sirui Zhao , Huaying Tang , Xinglong Mao , Shifeng Liu , Hanqing Tao , Hao Wang , Tong Xu , Enhong Chen

Category: Computer Vision

2023-01-03

As one of the most important psychic stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, automatic ME recognition (MER) is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Although several spontaneous ME datasets have recently been released to alleviate this problem, such work remains scarce. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos elicited from 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME in MER experiments to objectively verify the validity of the DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.


Surveillance Face Anti-spoofing

Hao Fang , Ajian Liu , Jun Wan , Sergio Escalera , Chenxu Zhao , Xu Zhang , Stan Z. Li , Zhen Lei

Category: Computer Vision

2023-01-03

Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) while lacking consideration of long-distance scenes (e.g., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this setting, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network. (2) Generated sample pairs are used to simulate quality variance distributions, helping contrastive learning strategies obtain robust feature representations under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.


Benchmarking the Robustness of LiDAR Semantic Segmentation Models

Xu Yan , Chaoda Zheng , Zhen Li , Shuguang Cui , Dengxin Dai

Category: Computer Vision

2023-01-03

When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy. Then, we systematically investigate 11 LiDAR semantic segmentation models, spanning different input representations (e.g., point clouds, voxels, and projected images), network architectures and training schemes. Through this study, we obtain two insights: 1) The input representation plays a crucial role in robustness. Specifically, under specific corruptions, different representations perform differently. 2) Although state-of-the-art methods for LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) which greatly boosts robustness with simple but effective modifications. We hope that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.


PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

Xiangtai Li , Shilin Xu , Yibo Yang , Haobo Yuan , Guangliang Cheng , Yunhai Tong , Zhouchen Lin , Dacheng Tao

Category: Computer Vision

2023-01-03

Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works use separate approaches to handle thing, stuff, and part predictions without shared computation or task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, named Panoptic-PartFormer. Moreover, we find that the previous metric PartPQ is biased toward PQ. To handle both issues, we make the following contributions: First, we design a meta-architecture that decouples the part feature from the things/stuff feature. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model Panoptic-PartFormer. Second, we propose a new metric, Part-Whole Quality (PWQ), to better measure this task from both pixel-region and part-whole perspectives. It can also decouple the error for part segmentation and panoptic segmentation. Third, inspired by Mask2Former and building on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme to further boost part segmentation quality. We design a new part-whole interaction method using masked cross attention. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with the previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% in GFlops and 50% in parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
